Random Sampling from B+ Trees

Authors

  • Frank Olken
  • Doron Rotem
Abstract

We consider the design and analysis of algorithms to retrieve simple random samples from databases. Specifically, we examine simple random sampling from B+ tree files. Existing methods of sampling from B+ trees require the use of auxiliary rank information in the nodes of the tree. Such modified B+ tree files are called “ranked B+ trees”. We compare sampling from ranked B+ tree files with new acceptance/rejection (A/R) sampling methods which sample directly from standard B+ trees. Our new A/R sampling algorithm can easily be retrofitted to existing DBMSs and does not require the overhead of maintaining rank information. We consider both iterative and batch sampling methods.
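To make the acceptance/rejection idea concrete, the following is a minimal Python sketch, not the authors' algorithm as published. It assumes a toy in-memory tree in which every leaf sits at the same depth, and the names `Leaf`, `Internal`, `ar_trial`, `ar_sample`, `max_fanout`, and `max_leaf` are hypothetical. At each node a slot is drawn uniformly from the maximum possible fanout; an empty slot rejects the whole trial, so every record is reached with the same probability regardless of how full its pages are. The sketch samples with replacement.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    records: List[int]        # record keys stored in this leaf page


@dataclass
class Internal:
    children: List["Node"]    # child pointers; all leaves are at the same depth


Node = Union[Leaf, Internal]


def ar_trial(root: Node, max_fanout: int, max_leaf: int,
             rng: random.Random) -> Optional[int]:
    """One root-to-leaf walk; returns a record, or None if the trial is rejected."""
    node = root
    while isinstance(node, Internal):
        slot = rng.randrange(max_fanout)      # uniform over all possible slots
        if slot >= len(node.children):        # empty slot: reject and restart
            return None
        node = node.children[slot]
    slot = rng.randrange(max_leaf)            # same rule inside the leaf page
    if slot >= len(node.records):
        return None
    return node.records[slot]


def ar_sample(root: Node, n: int, max_fanout: int, max_leaf: int,
              rng: random.Random) -> List[int]:
    """Repeat trials until n records are accepted (sampling with replacement)."""
    sample: List[int] = []
    while len(sample) < n:
        rec = ar_trial(root, max_fanout, max_leaf, rng)
        if rec is not None:
            sample.append(rec)
    return sample


# Toy tree with unevenly filled leaves (assumed capacities: 4 children, 3 records).
tree = Internal([Leaf([1, 2, 3]), Leaf([4]), Leaf([5, 6])])
sample = ar_sample(tree, 5, max_fanout=4, max_leaf=3, rng=random.Random(0))
print(sample)
```

In this toy tree each record is accepted with probability 1/4 × 1/3 = 1/12 per trial, so about half of all trials are rejected. A ranked B+ tree avoids rejections by storing subtree counts in each node, at the cost of updating those counts on every insertion and deletion.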


Similar resources

Random Sampling from Pseudo-Ranked B+ Trees

In the past, two basic approaches for sampling from B+ trees have been suggested: sampling from the ranked trees and acceptance/rejection sampling from non-ranked trees. The first approach requires the entire root-to-leaf path to be updated with each insertion and deletion. The second has no update overhead, but incurs a high rejection rate for the compressed-key B+ trees commonly used in prac...


A Study on the Accuracy and Precision of Estimation of the Number, Basal Area and Standing Trees Volume per Hectare Using of some Sampling Methods in Forests of NavAsalem

The present study aimed to investigate the accuracy and precision of estimates of the number, basal area, and volume of standing trees obtained by random and systematic random sampling in the forests of West Guilan. The cost, or inventory time, was assessed using the criterion (E%)² × T. The inventory was carried out by complete sampling (census) in an area of 52 hectares. The study area (sect...


Boolean Functions Fitness Spaces

We investigate the distribution of performance of the Boolean functions of 3 Boolean inputs (particularly that of the parity functions), the always-on-6 and even-6 parity functions. We use enumeration, uniform Monte-Carlo random sampling and sampling random full trees. As expected, XOR dramatically changes the fitness distributions. In all cases once some minimum size threshold has been exceeded,...


Regenerative Tree Growth: Binary Self-similar Continuum Random Trees and Poisson–Dirichlet Compositions by Jim Pitman

We use a natural ordered extension of the Chinese Restaurant Process to grow a two-parameter family of binary self-similar continuum fragmentation trees. We provide an explicit embedding of Ford’s sequence of alpha model trees in the continuum tree which we identified in a previous article as a distributional scaling limit of Ford’s trees. In general, the Markov branching trees induced by the t...


Random Sampling from Databases

Random Sampling from Databases, by Frank Olken. Doctor of Philosophy in Computer Science, University of California at Berkeley. Professor Michael Stonebraker, Chair. In this thesis I describe efficient methods of answering random sampling queries of relational databases, i.e., retrieving random samples of the results of relational queries. I begin with a discussion of the motivation for including samp...




Publication date: 1989